The CCP4 suite: integrative software for macromolecular crystallography

This article describes the Collaborative Computational Project No. 4 (CCP4). It is intended as a general literature citation for the use of the CCP4 software suite in structure determination.

The Collaborative Computational Project No. 4 (CCP4) is a UK-led international collective with a mission to develop, test, distribute and promote software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs brought together by familiar execution routines, a set of common libraries and graphical interfaces. The CCP4 suite has undergone considerable changes since its last reference article, involving new infrastructure, new programs and new graphical interfaces. This article, which is intended as a general literature citation for the use of the CCP4 software suite in structure determination, will guide the reader through these transformations, offering a general overview of the new features and outlining future developments. As such, it aims to highlight the individual programs that comprise the suite and to provide the latest references to them for crystallographers around the world.

Introduction
As a technique, macromolecular crystallography (MX) relies heavily on computational methods built on top of a strict set of conventions and common formats. Most conventions follow the lead of the International Union of Crystallography (IUCr), while MX software development is undertaken by both academic and private-sector initiatives, such as the Phenix Consortium (Liebschner et al., 2019) and Global Phasing Ltd (Cambridge, United Kingdom). In the UK, MX software tools find a common distribution and maintenance channel under the umbrella of the Collaborative Computational Project No. 4, best known as CCP4. This consortium was established by the UK Science Research Council in 1979, almost 45 years ago, to facilitate the coordination of, and collaboration between, MX software developers. Aside from coordinating and distributing software, CCP4 has a mission to promote the teaching of MX, with an annual didactic CCP4 Study Weekend and numerous online and in-person workshops around the world every year. Forums, which originally took the shape of email lists (the CCP4 bulletin board, or CCP4bb, for general users' questions and ccp4-dev for developer discussions), are an evolving aspect of the CCP4 community, with social media taking a more prominent role in hosting other kinds of exchanges, for example paper or event announcements (Twitter: @ccp4_mx) or parallel discussions at conferences (Slack channels). The CCP4 website (https://www.ccp4.ac.uk) is the primary mechanism for reference and asynchronous communication but, most importantly, provides a central distribution point for software downloads. A minimal installer package can be obtained from the site that installs the latest version of the suite. Updates are then distributed via a nondisruptive mechanism that was first introduced with CCP4 version 6.3.0 in 2012. Update reminders are generated automatically, although the update itself is, by design, initiated manually.
As an indication of update frequency, the 7.0 series, which was originally released in 2016, saw more than 70 updates until the 7.1 series was released in 2020. Updates are not a one-way road: they may be rolled back if problems are encountered. Whilst every effort has been made to keep the suite streamlined and maintainable, the inclusion of large databases and toolkits has driven space requirements steadily upwards (Fig. 1).
The last decade has seen some large transformations in the field of MX: new workflows have been created (for example phasing with AlphaFold2 models), some old workflows have been optimized and others are on the verge of disappearing. This has often been the result of cross-pollination with other techniques in structural biology, in particular electron cryo-microscopy (cryo-EM), through a synergistic collaboration with CCP-EM (Burnley et al., 2017), the Collaborative Computational Project for Cryo-EM, which repurposes some CCP4 code for the cryo-EM community. Owing to the deep-learning revolution in computational structure prediction (Jumper et al., 2021), it is now possible to phase most structures using large predicted fragments or, given the accuracy of the method, even to rigid-body fit an initial predicted model into electron density (McCoy et al., 2022; Medina et al., 2022). As a side effect of these new workflows, experimental phasing is losing importance in the everyday activities of an MX laboratory, with derivatives only being prepared as a last resort after all of the now conventional methods have failed. Data acquisition and processing, greatly bolstered by both software and hardware developments in situ at synchrotrons, are now performed almost instantaneously after data collection, presenting the user with the results of applying different processing strategies. Although seemingly unconnected, most of these newer developments have one thing in common: the Python programming language as a platform for pipelining and program communication.
While some Python scripts were already part of the CCP4 suite even before the time of the last general publication, most of the recent source code committed to the CCP4 repositories involves Python in one way or another; for example, both the data-integration tool DIALS and its CCP4 graphical user interface DUI (Fuentes-Montero et al., 2016) are Python-heavy software. Other CCP4 programs, written in other languages such as C++ for performance reasons, may also offer Python bindings; examples include Coot (Emsley et al., 2010), Privateer (Agirre et al., 2015) and GEMMI, a crystallographic toolkit developed in collaboration with Global Phasing Ltd. Both the Python language and its interpreter are now at the core of the CCP4 suite. Importantly, both new graphical user interfaces to the CCP4 suite (see below) make substantial use of Python.
On the subject of graphical user interfaces, a large paradigm shift is also under way, with both CCP4i2 and CCP4 Cloud making extensive use of web technologies: HTML, CSS and JavaScript are used for both interface design and result presentation, with CCP4 Cloud making a strong case for the transformation of existing interactive model-building and illustration applications, for example Coot and CCP4mg, into apps that can be run within a web browser.

Figure 1
Evolution in the size of the CCP4 suite from version 4.2 (2002) through to version 8.0 (2022). Some representative programs included in the releases are highlighted in orange. The update mechanism (CCP4-um) was first used in version 6.3. New graphical interfaces were introduced in versions 7.0 (CCP4i2) and 7.1 (CCP4 Cloud). Coot and CCP4mg were originally distributed separately, but were bundled with the suite from version 6.5. For reference, the sizes of two popular contemporary storage devices are shown as dotted lines; please note that these were never targeted as distribution media.

Overview of the newest developments

Graphical user interfaces
The long-serving CCP4i interface (developed in Tcl/Tk) has recently been deprecated and replaced by a more modern Qt/PySide graphical user interface (GUI) named CCP4i2 (Potterton et al., 2018). CCP4i2, the main purpose of which is to provide a desktop-based experience, introduces a number of architectural differences compared with its predecessor. (i) A real database system, as opposed to a directory structure, provides traceability of files and jobs, and allows the inputs to follow-on jobs to be populated automatically with outputs from previous jobs. (ii) Large MTZ files are separated into important column sets that define particular data types and have predictable names; for example, Miller indices (H, K and L columns) plus amplitudes and their estimated standard deviations or e.s.d.s (F and SIGF columns) define an 'Amplitudes' data type. (iii) Individual programs are wrapped in Python for their incorporation into tasks, which in many cases are pipelines themselves; for example, 'Data reduction' is a pipeline that involves the programs POINTLESS, AIMLESS, CTRUNCATE and FREER. (iv) Results are communicated between individual programs through structured data (XML) files. In addition, task reports aim to present only fundamental results and, where possible, provide expert diagnostics in natural, human-readable language, for example 'No evidence of possible translational noncrystallographic symmetry'. Other utilities include a multiplatform project import and export mechanism, instant job search by keywords, task-specific key performance indicators such as Rwork/Rfree, and context-dependent follow-on jobs with automatic selection of input files and default options. Outside the graphical user interface but very much within its infrastructure, the i2run module provides a command-line mechanism for running CCP4i2 pipelines, opening the door to batch processing with interface-level decision making.
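To illustrate point (ii), the following minimal Python sketch groups MTZ columns into typed data objects using only each column's label and its one-letter MTZ column type (H for Miller indices, J for intensities, F for amplitudes, Q for standard deviations, I for integers such as free-R flags). The function group_columns, the PAIR_TYPES table and the grouping rule (a value column immediately followed by its sigma column) are hypothetical and do not reproduce the CCP4i2 implementation; in practice the column list would be read from the file with an MTZ library.

```python
from typing import Dict, List, Tuple

# One-letter MTZ column types used here: H = Miller index, J = intensity,
# F = amplitude, Q = standard deviation of the J or F column before it.
Column = Tuple[str, str]  # (column label, MTZ column type)

# Hypothetical names for the data types built from (value, sigma) pairs.
PAIR_TYPES = {"F": "Amplitudes", "J": "Intensities"}


def group_columns(columns: List[Column]) -> Dict[str, List[str]]:
    """Group MTZ columns into typed data objects (illustrative rule only)."""
    index_labels = [label for label, ctype in columns if ctype == "H"]
    if len(index_labels) != 3:
        raise ValueError("expected exactly three Miller-index (H) columns")

    groups: Dict[str, List[str]] = {}
    for (label, ctype), (next_label, next_type) in zip(columns, columns[1:]):
        # A value column immediately followed by its sigma column forms a pair.
        if ctype in PAIR_TYPES and next_type == "Q":
            name = PAIR_TYPES[ctype]
            if name in groups:  # keep a second pair distinct, e.g. "Amplitudes:FP"
                name = f"{name}:{label}"
            groups[name] = index_labels + [label, next_label]
    return groups


columns = [("H", "H"), ("K", "H"), ("L", "H"),
           ("IMEAN", "J"), ("SIGIMEAN", "Q"),
           ("F", "F"), ("SIGF", "Q"),
           ("FreeR_flag", "I")]
for name, labels in group_columns(columns).items():
    print(name, labels)
```

The free-R flag column is left out of both groups because it is not a (value, sigma) pair; a fuller scheme would give it a data type of its own.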
CCP4 Cloud is a complete reimagination of what an interface should look like in the context of macromolecular crystallography. Technology-wise, it provides a server-side JavaScript implementation (based on Node.js) designed to work with high-performance computing (HPC) facilities (clusters and generic clouds), but which can also be run on a user's PC. This implementation also enables secure web access from a browser via HTML5, CSS and JavaScript (jQuery), and allows CCP4 Cloud to look consistent across different browsers and platforms, making it possible to run jobs and manage projects from, for example, mobile devices. The interface provides a general file-import function, and the types of data imported determine which kinds of job can be run: for example, automated model building can only be performed if at least reflections and a sequence have been imported. The system features task interfaces for many CCP4 programs and some newly introduced pipelines. One such pipeline is CCP4build, which combines Parrot for density modification (Cowtan, 2010), Buccaneer for model building (Cowtan, 2006), REFMAC for refinement, Coot for model editing (Emsley et al., 2010) and EDSTATS (Tickle, 2012) for model-accuracy analysis; using these tools, CCP4build is able to make expert decisions depending on the phasing approach and model completeness. High-level progress indicators are available in both CCP4 Cloud and CCP4i2; one such example is the 'verdict' functionality, which provides a score for model completeness and fit to the experimental data. CCP4i2 and CCP4 Cloud have a conceptually similar set of tasks, although their graphical presentation differs (Fig. 2).
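The import-driven gating of tasks can be pictured as a simple set-inclusion test. In this sketch the task names, data-type names and requirement table are hypothetical placeholders rather than CCP4 Cloud identifiers; only the rule that automated model building needs at least reflections and a sequence is taken from the text above.

```python
from typing import AbstractSet, Dict, FrozenSet, List

# Hypothetical table: the data types each task needs before it is offered.
# Task and data-type names are illustrative, not CCP4 Cloud identifiers.
TASK_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "automated_model_building": frozenset({"reflections", "sequence"}),
    "molecular_replacement": frozenset({"reflections", "sequence", "model"}),
    "refinement": frozenset({"reflections", "model"}),
    "data_reduction": frozenset({"unmerged_reflections"}),
}


def available_tasks(imported: AbstractSet[str]) -> List[str]:
    """Return, sorted by name, the tasks whose required data are all present."""
    return sorted(task for task, needs in TASK_REQUIREMENTS.items()
                  if needs <= imported)


print(available_tasks({"reflections"}))              # []
print(available_tasks({"reflections", "sequence"}))  # ['automated_model_building']
```

Keeping requirements as data rather than code means that adding a task only requires a new table entry, which is one plausible way an interface could decide what to offer after each import.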

Data processing
Developed in collaboration with Diamond Light Source and the Lawrence Berkeley National Laboratory, the DIALS project is the CCP4 suite's main diffraction-image processing toolkit; it is modular and hackable by design, so experienced crystallographers can tweak, extend or add new algorithms. Despite this specialist, component-based approach, complete DIALS workflows are provided by the xia2 pipeline (Winter, 2010), which incorporates expert decision making. More recently, a graphical user interface (the DIALS User Interface or DUI) has also been introduced (Fuentes-Montero et al., 2016). The xia2 pipeline is run automatically at the end of data collections at Diamond Light Source (Oxfordshire, United Kingdom), providing the results of applying multiple data-processing strategies: users are expected to examine the metrics provided and decide which result is best suited to their diffraction data set. Newcomers wanting to learn more about DIALS are advised to use DUI, which provides a guided step-by-step execution of the whole process, although command-line use through simple scripts is also designed to be accessible to nonexpert users.
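For readers new to the command line, the sketch below lists a typical sequence of DIALS programs of the kind that xia2 chains together automatically. It only builds the commands; the intermediate file names assume DIALS default output names and should be checked against the documentation of the installed DIALS version, and the image file names are placeholders.

```python
from typing import List


def dials_commands(image_files: List[str]) -> List[List[str]]:
    """Return (without running) a typical DIALS command sequence.

    Each step reads the .expt/.refl files written by an earlier step; the file
    names assume DIALS default output names.
    """
    return [
        ["dials.import", *image_files],                           # imported.expt
        ["dials.find_spots", "imported.expt"],                    # strong.refl
        ["dials.index", "imported.expt", "strong.refl"],          # indexed.*
        ["dials.refine", "indexed.expt", "indexed.refl"],         # refined.*
        ["dials.integrate", "refined.expt", "refined.refl"],      # integrated.*
        ["dials.symmetry", "integrated.expt", "integrated.refl"], # symmetrized.*
        ["dials.scale", "symmetrized.expt", "symmetrized.refl"],  # scaled.*
        ["dials.export", "scaled.expt", "scaled.refl"],           # scaled.mtz
    ]


for command in dials_commands(["image_00001.cbf", "image_00002.cbf"]):
    print(" ".join(command))
```

Each list could be passed to subprocess.run(command, check=True) to execute the steps in order; printing them first makes it easy to inspect or adapt a script before running it.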
DIALS is able to natively process data obtained at X-ray free-electron laser (XFEL) facilities (Ginn et al., 2015; Uervirojnangkoorn et al., 2015) and supports multi-crystal scaling (Beilsten-Edmands et al., 2020) and analysis via xia2.multiplex (Gildea et al., 2022), serial crystallography (… Parkhurst, 2020) and electron diffraction data such as those obtained with standard field-emission gun (FEG) cryo-microscopes (Clabbers et al., 2018). Data from multiple crystals may also be scaled and merged with BLEND (Mylona et al., 2017). Ice rings and other pathologies in measured data can be identified by a separate standalone tool named AUSPEX, which provides visual and automatic diagnostics based on statistics (Thorn et al., 2017) and, more recently, machine learning (Nolte et al., 2022). Alternatively, the iMosflm software (Powell et al., 2017) provides an easy-to-use interface to the MOSFLM image-processing program; while the software is no longer under active development, it contains many useful features and remains popular with users.
Once the data have been processed, Laue-group determination and data scaling and reduction can be performed directly with DIALS, although POINTLESS and AIMLESS are also offered as a fallback mechanism (Evans & Murshudov, 2013); indeed, the latter two programs form the basis of the CCP4i2 'Data reduction' task. Further diagnostics can be obtained by running CTRUNCATE, which was originally an implementation of French and Wilson's algorithm (French & Wilson, 1978) for obtaining structure-factor amplitudes from intensities; it will scan data sets for signs of anisotropic diffraction, twinning and translational noncrystallographic symmetry that could complicate or even compromise the downstream structure-determination process. This set of programs has graphical interfaces in both CCP4i2 and CCP4 Cloud, producing colour-coded reports that flag up potential problems. Importantly, detailed reports are generated whenever merged intensities or amplitudes are imported into the graphical interfaces, providing a sanity check and metadata tracking.

Figure 2
Comparison of the new CCP4 graphical user interface offerings: (a) desktop (CCP4i2) and (b) online (CCP4 Cloud). The same pipeline (Crank-2) has been run on both interfaces. The reports show equivalent graphs owing to the use of a compatibility layer that allows the same report code to run on both platforms.
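The French and Wilson idea behind CTRUNCATE's original purpose can be sketched numerically. The true intensity J is given a Wilson prior, the measurement a Gaussian likelihood, and the amplitude is estimated as the posterior mean of the square root of J, so that weak or even negative measured intensities still yield positive amplitudes. This sketch handles only acentric reflections, assumes that the mean intensity Sigma of the resolution shell is already known, and integrates with the trapezoid rule; it is not the CTRUNCATE code.

```python
import math
from typing import Tuple


def french_wilson_acentric(i_obs: float, sig_i: float, sigma_wilson: float,
                           n_steps: int = 20000) -> Tuple[float, float]:
    """Posterior means <J> and <F> = <sqrt(J)> for one acentric reflection.

    Prior: acentric Wilson distribution, p(J) proportional to exp(-J / Sigma), J >= 0.
    Likelihood: Gaussian in the measured intensity, p(I_obs | J) ~ N(J, sig_i^2).
    The integrals over J are evaluated with the trapezoid rule on [0, upper].
    """
    upper = max(i_obs, 0.0) + 10.0 * sig_i + 20.0 * sigma_wilson
    step = upper / n_steps
    grid = [k * step for k in range(n_steps + 1)]
    # Unnormalized log-posterior; subtracting its maximum avoids underflow.
    log_post = [-j / sigma_wilson - 0.5 * ((i_obs - j) / sig_i) ** 2 for j in grid]
    top = max(log_post)
    weights = [math.exp(value - top) for value in log_post]
    weights[0] *= 0.5   # trapezoid end-point weights
    weights[-1] *= 0.5
    norm = sum(weights)
    mean_j = sum(w * j for w, j in zip(weights, grid)) / norm
    mean_f = sum(w * math.sqrt(j) for w, j in zip(weights, grid)) / norm
    return mean_j, mean_f


# A negative measured intensity still yields a positive amplitude estimate.
print(french_wilson_acentric(-50.0, 40.0, 1000.0))
# A strong reflection is essentially unchanged: F close to sqrt(I_obs) = 100.
print(french_wilson_acentric(10000.0, 10.0, 1000.0))
```

For strong reflections the prior barely matters and F approaches the square root of the measured intensity; the correction is largest exactly where naive square roots fail, for weak and negative measurements.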

Phasing
The CCP4 suite provides software for all phasing methods, which mainly fall within one of the following categories: molecular replacement (MR), ab initio phasing with ideal fragments (a special case of molecular replacement) and experimental phasing. In the coming years, owing to the recent improvements in protein structure-prediction methods, the line between the first two categories is expected to become blurred or even disappear.
2.3.1. Molecular replacement and ab initio phasing, including bioinformatics. While the ever-growing area of bioinformatics is outside the remit of CCP4, the search for suitable molecular-replacement templates is primarily driven by protein homology analysis and therefore exploits bioinformatics methods. Various third-party tools have been incorporated into the suite to support the CCP4 model-preparation tools and automated structure-solution pipelines.
MrBUMP is an automated tool that searches for templates and attempts molecular replacement with them, displaying comprehensive results that can be taken forward provided that the R factors are low enough. It can find structures of homologues using PHMMER (Eddy, 2011) or HHpred (Söding, 2005) and place them using either Phaser (McCoy et al., 2007) or MOLREP (Vagin & Teplyakov, 2010). The template-search code of MrBUMP can also be harnessed interactively in CCP4mg, allowing users to create composite models and ensembles for subsequent MR searches; this tool can be accessed from both CCP4i2 and CCP4 Cloud. MrParse provides a convenient visualization of potential search models from the PDB and from databases of new-generation models such as the AlphaFold Protein Structure Database (Varadi et al., 2022). Slice'N'Dice (Simpkin, Elliott et al., 2022) is an automated molecular-replacement pipeline that slices predicted models, as well as homologues, into domains that may differ in relative orientation from those in the crystal structure, and then facilitates the placement of these domains by molecular replacement. CCP4mg (McNicholas et al., 2011) can also be used to visualize the slicing of the input models.
Phaser uses a maximum-likelihood approach to the phasing problem; it is the only molecular-replacement software that uses intensities natively, i.e. without turning them into amplitudes first, and can also use SAD data (for SAD and MR-SAD phasing). The voyager (Sammito et al., 2019) automated procedure within Phaser presents a new architecture that allows more flexibility, guiding user decisions in creating ensembles. It also provides, alongside a plethora of new and reimplemented algorithms, code to make the best use of AlphaFold (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021) structure predictions, or high-confidence subsets of them, including the transformation of model confidence metrics (for example the AlphaFold pLDDT) into estimated B factors. Owing to the flexibility of the new design, tools for fitting models into cryo-EM maps have been included. An ad hoc graphical user interface is under development; this will allow easier navigation of the different solutions calculated during the search strategy, presenting the user with essential plots such as the self-rotation function.
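The conversion of pLDDT values into B factors can be sketched as follows. The specific mapping used here, an estimated coordinate error of 1.5 exp[4(0.7 - pLDDT/100)] Å converted into a B factor through B = (8π²/3)⟨Δr²⟩, is an assumption based on a conversion reported in the literature for predicted models, and may differ in detail from what Phaser does internally.

```python
import math


def plddt_to_b(plddt: float) -> float:
    """Estimated isotropic B factor (Å^2) from a pLDDT value on the 0-100 scale.

    Assumed mapping: rms error = 1.5 * exp(4 * (0.7 - pLDDT / 100)) Å, then
    B = (8 * pi^2 / 3) * rms^2 (the isotropic relation between B and the
    mean-square positional error).
    """
    fraction = min(max(plddt, 0.0), 100.0) / 100.0
    rms_error = 1.5 * math.exp(4.0 * (0.7 - fraction))
    return 8.0 * math.pi ** 2 / 3.0 * rms_error ** 2


for value in (95.0, 70.0, 50.0):
    print(value, round(plddt_to_b(value), 1))
```

The exponential form makes low-confidence regions receive very large B factors, which effectively down-weights them in likelihood-based molecular replacement rather than requiring them to be removed by hand.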
CCP4 also has fragment-based ab initio phasing packages: ARCIMBOLDO (Rodríguez et al., 2009) and Fragon (Jenkins, 2018), which use ideal fragments of proteins (mainly helices) in targeted molecular-replacement searches. The use of these programs was initially confined to high-resolution data, but owing to improved search strategies (Medina et al., 2022), phase combination (Millán et al., 2020) and the use of available structural information, including AlphaFold predictions, they have recently enjoyed success at resolutions lower than 2.3 Å, a threshold beyond which it becomes difficult to ascertain the direction of helical fragments. ARCIMBOLDO can also use fragments of homologous models and phase previously intractable coiled-coil structures (Caballero et al., 2018). Part of the success of these methods is down to the ability of Phaser to place single amino acids or even atoms with great accuracy (McCoy et al., 2017) and the ability of the density-modification and autotracing algorithms in SHELXE (Usón & Sheldrick, 2018) to bootstrap solutions from poor starting phase sets with average phase errors as high as 70° (Millán et al., 2015). Also in alternative MR territory is AMPLE (Bibby et al., 2012), which specializes in editing search-model ensembles, particularly ab initio predictions and distant homologues.
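To put the figure of 70° into perspective, the mean phase error between two phase sets can be computed as below; completely random phases give an expected mean error of 90°, so starting sets with 70° errors are only modestly better than random. The sketch assumes that both phase sets refer to the same origin and hand, and the optional weights (for example amplitudes) are the caller's choice.

```python
from typing import Optional, Sequence


def mean_phase_error(phases: Sequence[float], reference: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) mean absolute phase difference in degrees, in [0, 180].

    Assumes both phase sets refer to the same origin and hand.
    """
    if weights is None:
        weights = [1.0] * len(phases)
    total = 0.0
    for phi, phi_ref, w in zip(phases, reference, weights):
        diff = abs(phi - phi_ref) % 360.0
        total += w * min(diff, 360.0 - diff)  # shortest way round the circle
    return total / sum(weights)


print(mean_phase_error([10.0, 350.0], [350.0, 10.0]))         # 20.0
print(mean_phase_error([0.0, 0.0], [90.0, 0.0], [1.0, 3.0]))  # 22.5
```

Wrapping each difference into [0°, 180°] matters: a naive subtraction would report 340° for two phases that are in fact only 20° apart.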
SIMBAD (Simpkin et al., 2020) provides a sequence-independent phasing pipeline that may be used for phasing crystals of unknown contaminants. Other MR pipelines use larger fragments or domains as their source of phasing information: BALBES (Long et al., 2008) and MoRDA (Vagin & Lebedev, 2015) are automated pipelines that use MOLREP to place matches from curated databases containing fragments, domains, and homo- and hetero-oligomers. Dimple (Wojdyr et al., 2013) is an automated procedure that aims to quickly arrive at a solved structure of a protein-ligand complex starting from an isomorphous crystal; the software will phase the data and produce preliminary maps, including a difference density map in which omit density for a ligand might be found.
2.3.2. Experimental phasing. The steady increase in unique new domains deposited every year in the PDB, the availability of millions of predicted models in the AlphaFold Protein Structure Database (Varadi et al., 2022) and the continuous improvement of fragment-based ab initio phasing methods mean that experimental phasing is increasingly becoming a last-resort approach to recovering phases; it also means that the software will have to deal with the most difficult cases. New since the time of the last CCP4 general publication is the inclusion of the SHELXC/D/E programs (Sheldrick, 2008), which can be run individually or as a pipeline through the Crank-2 frontend (Skubák & Pannu, 2013), available in both the CCP4i2 and CCP4 Cloud interfaces. Crank-2 itself incorporates a number of different algorithms that can deal with SAD, SIRAS, MAD and MR-SAD. As stated in the previous section, Phaser (McCoy et al., 2007) is also able to perform both SAD and MR-SAD phasing.

Model building and refinement
2.4.1. Interactive model building. The CCP4 suite ships with the de facto industry-standard interactive model-building program Coot (Emsley et al., 2010). After two decades of constant development, Coot has now reached version 1.0, which incorporates a major rework of the graphical architecture, interface, tools and components of the program. Aside from all of the well known tools for manual model building, Coot provides: a built-in ligand-building tool, Lidia, which can use AceDRG (see below) for restraint generation; the ability to create covalent linkages between protein and ligand or between other molecular components; a semi-automatic N-glycan building tool, which is able to build entire oligosaccharides consistent with the most common biosynthetic pathways (Emsley & Crispin, 2018); an accelerated real-space refinement tool that is able to process whole macromolecules, in contrast to the manual, localized real-space refinement that users typically perform when fitting or tweaking parts of a model (Casañal et al., 2020); validation tools that run the most common checks on protein models (Ramachandran plots, rotamer propensities, planarity of the peptide bond, per-residue B factors and density-fit analysis, amongst others); and tools to facilitate ligand fitting (Nicholls, 2017) and validation, for example deviations from ideal geometry values in dictionaries, clashes and interaction maps. Coot makes use of the CCP4 Monomer Library to obtain restraints for the most common biomolecule monomers (amino acids, carbohydrates, nucleic acids) and most ligands defined in the PDB Chemical Component Dictionary (Westbrook et al., 2015).
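As an example of one of the simplest geometric checks listed above, peptide-bond planarity can be assessed from the omega torsion angle (CA-C-N-CA). The sketch computes a signed torsion angle from four positions and flags peptides deviating from trans by more than a tolerance; the 30° default is a hypothetical choice rather than Coot's threshold, and genuine cis peptides (for example before proline) would also be flagged for inspection.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> List[float]:
    return [a[k] - b[k] for k in range(3)]


def _dot(a: Point, b: Point) -> float:
    return sum(a[k] * b[k] for k in range(3))


def _cross(a: Point, b: Point) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [b0[k] - d0 * b1[k] for k in range(3)]
    w = [b2[k] - d2 * b1[k] for k in range(3)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def omega_outlier(ca_i: Point, c_i: Point, n_next: Point, ca_next: Point,
                  tolerance: float = 30.0) -> bool:
    """Flag a peptide whose omega (CA-C-N-CA) deviates from trans (180 degrees)
    by more than `tolerance`; the 30-degree default is a hypothetical choice."""
    omega = dihedral(ca_i, c_i, n_next, ca_next)
    return 180.0 - abs(omega) > tolerance


# Toy coordinates: a planar trans arrangement and a planar cis arrangement.
print(omega_outlier((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # False (trans)
print(omega_outlier((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # True (cis)
```

Using atan2 on the projected bond vectors gives a signed angle over the full circle without the numerical problems of an arccos near 0° or 180°, which is exactly where omega values cluster.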
At present, Coot is tied to desktop machines owing to its reliance on the GTK toolkit (Emsley et al., 2010). This means that users of CCP4 Cloud need a local installation of the CCP4 suite in order to perform manual model building. However, there is an ongoing effort to produce a web-based interface that will use the Coot engine in the same manner as the GTK version but without requiring a local CCP4 installation.
2.4.2. Automated model building. While Coot has incrementally added a wealth of automatic procedures over the years, the CCP4 suite includes several fully automated pipelines that combine automated model-building software [Buccaneer (Cowtan, 2006) and Nautilus (Cowtan, 2014), ARP/wARP 8.0 (Lamzin et al., 2012) or the chain-tracing code in SHELXE (Usón & Sheldrick, 2018)] with reciprocal-space refinement (see Section 2.4.4) and validation [EDSTATS (Tickle, 2012) and MolProbity (Williams et al., 2018)] to produce protein and nucleic acid models that are completed iteratively. These pipelines, for example Modelcraft (Bond & Cowtan, 2022) in CCP4i2 and CCP4build in CCP4 Cloud, are available from both modern graphical user interfaces and are accompanied by graphical or textual summaries of the completeness of the built model. Outside the protein realm, AlphaFold (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021) models can be glycosylated using the glycan library and tools in the Privateer software (Bagdonas et al., 2021). PanDDA (Pearce et al., 2017) allows users to increase the signal-to-noise ratio of their ligand maps by combining several data sets from ligand-free and ligand-bound forms of the protein; the program has algorithms for combining different crystal forms. The current automated model-building offerings in the suite are completed by ARP/wARP 8.0 (Lamzin et al., 2012), which was first released jointly with CCP4 (version 7.0) in 2018; this software pioneered the iterative combination of model building and refinement (Perrakis et al., 1999), a feature that is now present in all modern model-building pipelines, and the automated addition of ligands (Langer et al., 2008). Modern versions of ARP/wARP may also be used with cryo-EM data (Chojnowski et al., 2021).
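In outline, the PanDDA approach of Pearce et al. (2017) averages maps from many ligand-free data sets into a 'ground-state' map and obtains an 'event map' for a single data set by subtracting a fraction of that background density and rescaling, which brings out weak, partial-occupancy ligand density. This sketch works on flattened lists of already aligned map values and takes the background fraction (bdc) as given; grid alignment, statistical Z-maps and the estimation of bdc, which the real program performs, are omitted.

```python
from typing import List, Sequence


def ground_state(maps: Sequence[Sequence[float]]) -> List[float]:
    """Average several aligned, ligand-free maps point by point."""
    n = len(maps)
    return [sum(values) / n for values in zip(*maps)]


def event_map(dataset_map: Sequence[float], ground: Sequence[float],
              bdc: float) -> List[float]:
    """Remove a fraction `bdc` of the ground-state density and rescale:
    event = (rho_dataset - bdc * rho_ground) / (1 - bdc)."""
    if not 0.0 <= bdc < 1.0:
        raise ValueError("bdc must lie in [0, 1)")
    return [(d - bdc * g) / (1.0 - bdc) for d, g in zip(dataset_map, ground)]


ground = ground_state([[0.0, 2.0], [2.0, 4.0]])  # [1.0, 3.0]
print(event_map([1.0, 5.0], ground, 0.5))         # [1.0, 7.0]
```

Grid points that simply reproduce the ground state keep their value, while points where the data set departs from it are amplified by 1/(1 - bdc), which is why an appropriate background fraction makes a low-occupancy ligand look like full-occupancy density.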
At a higher level, the PDB-REDO pipeline has been integrated into CCP4 through graphical interfaces in CCP4i2 and CCP4 Cloud, with API calls to the PDB-REDO web server (Joosten et al., 2014).

2.4.3. Restraint dictionaries: the CCP4 Monomer Library. The dictionaries in the CCP4 Monomer Library (Vagin et al., 2004) have been improved by the introduction of AceDRG (Long et al., 2017), which since version 7.0 of the suite can also generate restraint dictionaries for covalent linkages. New dictionaries are now routinely generated for many compounds, although pyranose sugars have received separate treatment to account for their conformational preferences (Atanasova et al., 2022; Joosten et al., 2022). In the CCP4 Monomer Library, H atoms have been modelled and restrained at their nuclear positions, as informed by neutron diffraction data (Allen & Bruno, 2010).
2.4.4. Refinement. The main tool for full-model reciprocal-space refinement in CCP4 is REFMAC5. The program uses the sparse-matrix approximation of the Fisher information matrix (Steiner et al., 2003) and is designed to be fast and flexible, with a number of refinement methods built into the engine, including restrained, unrestrained and rigid-body refinement. Jelly-body restraints are particularly useful for stabilizing refinement, for example after molecular replacement, when larger parts of a structure might need to move into place. In addition to controlling model parameterization and performing macromolecular refinement, REFMAC5 also performs map calculation. A variety of weighted maps are produced, which allow visualization, subsequent analyses and validation.
Acta Cryst. (2023). D79, 449-461
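Jelly-body restraints can be pictured as weak springs that hold each pair of nearby atoms at its current separation, so that coherent domain movements are cheap while local distortions are penalized. The sketch below records such pairs and evaluates the resulting penalty; the 4.2 Å cutoff and the 0.02 Å sigma are hypothetical defaults, and the quadratic form is a simplification rather than REFMAC5's implementation.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def jelly_body_pairs(coords: Sequence[Coord],
                     max_dist: float = 4.2) -> Dict[Tuple[int, int], float]:
    """Record the current distance of every atom pair closer than max_dist."""
    pairs = {}
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if d < max_dist:
            pairs[(i, j)] = d
    return pairs


def jelly_body_penalty(coords: Sequence[Coord],
                       pairs: Dict[Tuple[int, int], float],
                       sigma: float = 0.02) -> float:
    """Sum of squared, sigma-scaled deviations from the recorded distances."""
    return sum(((math.dist(coords[i], coords[j]) - d0) / sigma) ** 2
               for (i, j), d0 in pairs.items())


start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
pairs = jelly_body_pairs(start)  # only atoms 0 and 1 are close enough
shifted = [(x + 5.0, y, z) for x, y, z in start]
print(jelly_body_penalty(shifted, pairs))  # rigid shift: no penalty
print(jelly_body_penalty([(0.0, 0.0, 0.0), (1.52, 0.0, 0.0), (10.0, 0.0, 0.0)], pairs))
```

Because only interatomic distances enter the penalty, any rigid translation or rotation of a domain costs nothing, which is what allows larger parts of a structure to move into place while their internal geometry is preserved.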
REFMAC5 allows case-specific structural knowledge to be used during refinement through the external restraints mechanism (Nicholls et al., 2012; Kovalevskiy et al., 2018). These external restraints, which are most useful when only low-resolution data are available, can for instance be generated by ProSMART (Nicholls et al., 2014) for proteins and nucleic acids using homologues or backbone hydrogen-bonding patterns, by LibG (Brown et al., 2015) for nucleic acid base-pairing and stacking, and by Platonyzer (Touw et al., 2016) for zinc, sodium and magnesium sites. The automated pipeline LORESTR (Kovalevskiy et al., 2016) can be used to optimize the refinement protocol at low resolution, expediting the process and reducing manual user effort. New developments and the next generation of structure-refinement tools are being implemented in Servalcat using the GEMMI library (Yamashita et al., 2021, 2023). The PAIREF program (Malý et al., 2020), which has recently been introduced into CCP4i2, performs automatic paired refinement (Karplus & Diederichs, 2012) using the REFMAC5 refinement engine. It analyses the impact of weak reflections beyond the traditional high-resolution diffraction-limit cutoff on the quality of the refined model: the program monitors model and data quality indicators and model-to-data agreement metrics, and implements a decision-suggesting routine for the high-resolution cutoff that may result in the best model. Outside REFMAC5 and associated tools, the SHEETBEND software (Cowtan et al., 2020) allows very fast preliminary refinement of the atomic coordinates and, optionally, isotropic or anisotropic B factors. It is based on a novel approach in which a shift field, rather than individual atoms, is refined to update and morph models. This approach is particularly suited to correcting large shifts in secondary-structure elements after molecular replacement and is run by default as part of the Modelcraft pipeline (Bond & Cowtan, 2022).
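The decision step of paired refinement can be illustrated with a hypothetical rule. Each entry gives a trial cutoff and the change in Rfree between the models refined with and without the added shell, both evaluated on the same lower-resolution reflections, which is what makes the comparison fair. The function name and the stop-at-first-increase rule are illustrative assumptions, not PAIREF's actual decision routine.

```python
from typing import List, Tuple


def suggest_cutoff(start: float, steps: List[Tuple[float, float]]) -> float:
    """Return the last cutoff reached before R_free on the common set got worse.

    steps: (trial_cutoff, delta_rfree) in order of increasing resolution, where
    delta_rfree = R_free(refined to trial cutoff) - R_free(refined to previous
    cutoff), both computed on the reflections common to the two refinements.
    """
    cutoff = start
    for trial_cutoff, delta_rfree in steps:
        if delta_rfree > 0.0:  # the added shell made the model worse
            break
        cutoff = trial_cutoff
    return cutoff


shells = [(1.9, -0.004), (1.8, -0.001), (1.7, 0.002), (1.6, -0.001)]
print(suggest_cutoff(2.0, shells))  # 1.8
```

A real routine would weigh several indicators and tolerate small fluctuations; the point here is only that each extension is judged by its effect on a fixed, common set of reflections rather than by the statistics of the weak shell itself.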

Validation and deposition
Both the CCP4i2 and CCP4 Cloud interfaces include a validation and deposition interface developed in collaboration with the PDBe (the Protein Data Bank in Europe; wwPDB Consortium, 2019; Armstrong et al., 2020). The purpose of this tool is to prepare mmCIF files for deposition; additionally, it lets users see what their preliminary wwPDB validation report (Gore et al., 2012, 2017) would look like, allowing them to fix errors and notice interesting chemical features of a model before going through the actual deposition process. In preparation for deposition, the model and structure factors are converted into mmCIF format, which in turn allows the wwPDB to pre-populate many of the required metadata, such as refinement statistics. Further validation tools exist in CCP4 outside this online validation process. Protein model validation can be performed with a variety of tools. MolProbity analyses backbone geometry, rotamers and clashes, and produces a script file that generates a menu within Coot containing lists of outliers. Coot itself contains a plethora of interactive and live-updated validation tools, ranging from MolProbity-equivalent metrics to less frequently quoted metrics, for example the Kleywegt plot, which can be of great value depending on the problem. The EDSTATS software (Tickle, 2012) provides a unique analysis of model-to-data fit, separating results by main chain and side chain and examining difference density; its results can point out common modelling problems, such as poorly fitting regions requiring a peptide flip. Version 8.0 of CCP4 has seen the gradual inclusion of PDB-REDO (Joosten et al., 2012) functionality into the CCP4 interfaces; for example, Tortoize (Sobolev et al., 2020), a tool that analyses main-chain and side-chain geometry and reports Z-scores for every amino acid, is now integrated into the CCP4 validation tasks.
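Clash analysis can be illustrated with a simplified clashscore that counts atom pairs whose van der Waals spheres overlap by at least 0.4 Å and normalizes the count per 1000 atoms, following the definition of the MolProbity clashscore. The sketch is far cruder than MolProbity, which uses explicit H atoms and all-atom contact analysis; here the radii and the set of excluded (for example bonded) pairs are assumptions supplied by the caller.

```python
import math
from itertools import combinations
from typing import AbstractSet, Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (xyz in Å, van der Waals radius in Å)


def clashscore(atoms: Sequence[Atom],
               excluded: AbstractSet[Tuple[int, int]] = frozenset(),
               min_overlap: float = 0.4) -> float:
    """Serious clashes (overlap >= min_overlap Å) per 1000 atoms.

    `excluded` holds index pairs (i < j) that should not be scored, e.g. bonded atoms.
    """
    clashes = 0
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) in excluded:
            continue
        (xyz_i, r_i), (xyz_j, r_j) = atoms[i], atoms[j]
        if r_i + r_j - math.dist(xyz_i, xyz_j) >= min_overlap:
            clashes += 1
    return 1000.0 * clashes / len(atoms)


atoms = [((0.0, 0.0, 0.0), 1.7), ((2.5, 0.0, 0.0), 1.7), ((10.0, 0.0, 0.0), 1.7)]
print(clashscore(atoms))            # one clash among three atoms
print(clashscore(atoms, {(0, 1)}))  # 0.0 once the pair is excluded
```

Normalizing per 1000 atoms makes the score comparable between structures of different sizes, which is why it is quoted as a single number in validation reports.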
The visual output of PDB-REDO calculations is displayed consistently across CCP4i2, CCP4 Cloud and the PDB-REDO website by encapsulating various interactive plots and tables in a single self-contained web component. Errors, particularly sequence-register errors, can be detected with ConKit (Simkovic et al., 2017) by analysing the agreement between observed contacts and inter-residue distances and the predictions from software such as AlphaFold2 (Sánchez Rodríguez et al., 2022). The findMySequence software uses machine learning to identify unknown proteins in X-ray crystallography and cryo-EM data, with the added benefit of detecting elusive register errors, which may have a detrimental effect on the quality of the rest of the structure. The Iris validation framework (Rochira & Agirre, 2021) is a standalone tool that displays a variety of validation metrics as concentric circles, with modelling errors becoming visible as ripples across successive circles. Carbohydrate model validation, including protein glycosylation, can be carried out with the Privateer software (Agirre et al., 2015), which in the MKIV version incorporates checks of glycan composition against offline mirrors of several glycomics databases (Bagdonas et al., 2020) and of overall glycan conformation using Z-scores (Dialpuri et al., 2023). Specific structural radiation-damage sites in structures derived from cryocooled crystals can be identified with RABDAM through the Bdamage (Shelley et al., 2018) and Bnet (Shelley & Garman, 2022) metrics, and space-group and origin ambiguities may be detected and resolved using Zanuda (Lebedev & Isupov, 2014).
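The contact-based check can be sketched as the fraction of predicted contacts that the model satisfies, using the common definition of a contact as a C-beta to C-beta distance below 8 Å between residues separated by at least a minimum number of positions in sequence. A register error would show up as a drop in this fraction along the affected segment. The thresholds, function name and input format are assumptions; ConKit's own analysis is more elaborate.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def contact_precision(predicted: List[Tuple[int, int]],
                      cb_coords: Dict[int, Coord],
                      threshold: float = 8.0,
                      min_separation: int = 6) -> float:
    """Fraction of predicted contacts (i, j) whose model C-beta atoms lie
    closer than `threshold` Å; short-range pairs and missing residues are skipped."""
    scored = satisfied = 0
    for i, j in predicted:
        if abs(i - j) < min_separation or i not in cb_coords or j not in cb_coords:
            continue
        scored += 1
        if math.dist(cb_coords[i], cb_coords[j]) < threshold:
            satisfied += 1
    return satisfied / scored if scored else 0.0


model = {1: (0.0, 0.0, 0.0), 10: (5.0, 0.0, 0.0), 20: (20.0, 0.0, 0.0)}
print(contact_precision([(1, 10), (1, 20), (1, 3)], model))  # 0.5
```

Computing the same fraction over a sliding window of residues, rather than for the whole chain, would localize the segment where model and prediction disagree.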

Analysis and representation
PISA (Krissinel & Henrick, 2007) allows the analysis of molecular interfaces, calculating likely assemblies, intramolecular and intermolecular contacts, and accessible surface areas, offering insight into crystal packing. Intramolecular (predicted) contact maps and other related representations can be visualized with ConKit (Simkovic et al., 2017) or online at the ConPlot server (Sánchez Rodríguez et al., 2021).
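Accessible and buried surface areas of the kind PISA reports can be approximated with the classic Shrake-Rupley method: points are placed on a sphere of radius (atomic radius + probe radius) around each atom, and the fraction of points not buried inside any neighbouring sphere gives that atom's accessible area. The sketch below, assuming a 1.4 Å probe and caller-supplied radii, also computes an interface area as half of the accessible area lost on complex formation, one common convention; PISA's own algorithms and radii differ.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (xyz in Å, radius in Å)


def sphere_points(n: int) -> List[Tuple[float, float, float]]:
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k),
                       r * math.sin(golden_angle * k), z))
    return points


def sasa(atoms: Sequence[Atom], probe: float = 1.4, n_points: int = 960) -> float:
    """Total solvent-accessible surface area in Å^2 (Shrake-Rupley)."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (centre, radius) in enumerate(atoms):
        big_r = radius + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (centre[0] + big_r * ux, centre[1] + big_r * uy, centre[2] + big_r * uz)
            # A point is exposed if it lies outside every other expanded sphere.
            if all(math.dist(p, other) >= other_r + probe
                   for j, (other, other_r) in enumerate(atoms) if j != i):
                exposed += 1
        total += 4.0 * math.pi * big_r ** 2 * exposed / n_points
    return total


def interface_area(chain_a: Sequence[Atom], chain_b: Sequence[Atom],
                   probe: float = 1.4, n_points: int = 960) -> float:
    """Half of the accessible area buried when the two chains associate."""
    separate = sasa(chain_a, probe, n_points) + sasa(chain_b, probe, n_points)
    together = sasa(list(chain_a) + list(chain_b), probe, n_points)
    return 0.5 * (separate - together)


atom = ((0.0, 0.0, 0.0), 1.6)
print(sasa([atom]))  # 4 * pi * (1.6 + 1.4)**2 for an isolated atom
print(interface_area([atom], [((3.0, 0.0, 0.0), 1.6)]))  # positive: contact buries area
```

The accuracy of the estimate grows with n_points; a few hundred points per atom is usually enough for interface comparisons, at a cost that scales with the number of atom pairs in this naive version.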
On the representation side, the main tool in CCP4 is the CCP4 Molecular Graphics project (CCP4mg). Since the last … Some newer representations from CCP4mg can be seen in Fig. 3.

Under the bonnet
The dxtbx toolkit for DIALS (Parkhurst et al., 2014) is included as part of the cctbx (Grosse-Kunstleve et al., 2002) distribution; the clipper-python module, a SWIG wrapper around the original C++ Clipper library, is also included and supports a number of functions of the CCP4i2 interface, including the Iris validation framework (Rochira & Agirre, 2021). At a higher level, CCP4i2 (Potterton et al., 2018) provides code reusability via the command line, offering a mechanism for executing Python-only pipelines without a running instance of the graphical user interface (headless mode). CCP4 Cloud projects and automatic structure-solution workflows can also be initiated from the command line using the 'cloudrun' utility; this is useful for performing serial computations on selected targets. The Coot model-building software (Emsley & Cowtan, 2004), originally conceived as a C++ object-oriented toolkit, is now exposed as an importable Python module to allow code reuse in new applications, and is also able to run in headless mode, suppressing all graphical output. Finally, CCP4mg (McNicholas et al., 2011) is also able to run without graphics, generating images from a scene-description file in XML format; this functionality is used in CCP4i2 to generate molecular graphics of, for instance, autobuilt structures.

Future plans
The transition towards web technologies, which is already under way with the introduction of CCP4 Cloud, will be completed in the near future by the introduction of fully fledged model-building, visualization and figure-preparation web-browser interfaces to the existing Coot and CCP4mg engines. We also foresee an increase in the number of connections to theoretical modelling packages such as AlphaFold (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021), as well as deeper harnessing of the AlphaFold Protein Structure Database (Varadi et al., 2022).

Figure 3
A collection of newer representations included in the CCP4 Molecular Graphics project (CCP4mg). (a) PDB entry 2bn3 is a high-resolution model of insulin (Nanao et al., 2005); it is shown here as worms, with water molecules drawn as ellipsoids, both coloured and scaled by the anisotropic B factors of the model. (b) PDB entry 3v8x (Noinaj et al., 2012) is a structure of human transferrin (chain B), drawn here as a solvent-accessible surface with N-glycans shown as Glycoblocks (McNicholas & Agirre, 2017). (c) PDB entry 3c02, a structure of aquaglyceroporin from Plasmodium falciparum (Newby et al., 2008), embedded in a lipid bilayer by CHARMM-GUI (Jo et al., 2008); lipids are shown as cartoons.

Software availability and data-access statement
The CCP4 software suite can be obtained from https://www.ccp4.ac.uk/download. CCP4 maintains a public instance of CCP4 Cloud at https://cloud.ccp4.ac.uk, available to both academic and licensed commercial users. No data were generated in the context of the present publication.